The Internal Representation of LINQ Query Statements

At this point, you have been introduced to the process of building query expressions using various C# query operators (such as from, in, where, orderby, and select). Also, you discovered that some functionality of the LINQ to Objects API can only be accessed when calling extension methods of the Enumerable class. The truth of the matter, however, is that when compiled, the C# compiler actually translates all C# LINQ operators into calls on methods of the Enumerable class. In fact, if you were really a glutton for punishment, you could build all of your LINQ statements using nothing but the underlying object model.

A great many of the methods of Enumerable have been prototyped to take delegates as arguments. In particular, many methods require a generic delegate named Func<> defined within the System namespace of System.Core.dll. For example, consider the Where() method of Enumerable which is called on your behalf when you use the C# where LINQ query operator:

// Overloaded versions of the Enumerable.Where<T>() method.
// Note the second parameter is of type System.Func<>.
public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source,
    System.Func<TSource,int,bool> predicate)

public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source,
    System.Func<TSource,bool> predicate)

The Func<> delegate (as the name implies) represents a pattern for a given function with a set of arguments and a return value. If you were to examine this type using the Visual Studio 2010 object browser, you’ll notice that the Func<> delegate can take between zero and four input arguments (here typed T1, T2, T3, and T4 and named arg1, arg2, arg3, and arg4), and a return type denoted by TResult:

// The various formats of the Func<> delegate.
public delegate TResult Func<T1,T2,T3,T4,TResult>(T1 arg1, T2 arg2, T3 arg3, T4 arg4)

public delegate TResult Func<T1,T2,T3,TResult>(T1 arg1, T2 arg2, T3 arg3)

public delegate TResult Func<T1,T2,TResult>(T1 arg1, T2 arg2)

public delegate TResult Func<T1,TResult>(T1 arg1)

public delegate TResult Func<TResult>()

Given that many members of System.Linq.Enumerable demand a delegate as input, when invoking them, you can either manually create a new delegate type and author the necessary target methods, make use of a C# anonymous method, or define a proper lambda expression. Regardless of which approach you take, the end result is identical.

While it is true that making use of C# LINQ query operators is far and away the simplest way to build a LINQ query expression, let’s walk through each of these possible approaches just so you can see the connection between the C# query operators and the underlying Enumerable type.

Building Query Expressions with Query Operators (Revisited)

To begin, create a new Console Application named LinqUsingEnumerable. The Program class will define a series of static helper methods (each of which is called within the Main() method) to illustrate the various manners in which you can build LINQ query expressions.

The first method, QueryStringsWithOperators(), offers the most straightforward way to build a query expression and is identical to the code seen in the LinqOverArray example found earlier in this chapter:

static void QueryStringWithOperators()
{
    Console.WriteLine("***** Using Query Operators *****");

    string[] currentVideoGames = {"Morrowind", "Uncharted 2",
        "Fallout 3", "Daxter", "System Shock 2"};

    var subset = from game in currentVideoGames
        where game.Contains(" ") orderby game select game;

    foreach (string s in subset)
        Console.WriteLine("Item: {0}", s);
}

The obvious benefit of using C# query operators to build query expressions is the fact that the Func<> delegates and calls on the Enumerable type are out of sight and out of mind, as it is the job of the C# compiler to perform this translation. To be sure, building LINQ expressions using various query operators (from, in, where, or orderby) is the most common and straightforward approach.

Building Query Expressions Using the Enumerable Type and Lambda Expressions

Keep in mind that the LINQ query operators used here are simply shorthand versions for calling various extension methods defined by the Enumerable type. Consider the following QueryStringsWithEnumerableAndLambdas() method, which is processing the local string array now making direct use of the Enumerable extension methods:

static void QueryStringsWithEnumerableAndLambdas()
{
    Console.WriteLine("***** Using Enumerable / Lambda Expressions *****");

    string[] currentVideoGames = {"Morrowind", "Uncharted 2",
        "Fallout 3", "Daxter", "System Shock 2"};

    // Build a query expression using extension methods
    // granted to the Array via the Enumerable type.
    var subset = currentVideoGames.Where(game => game.Contains(" "))
        .OrderBy(game => game).Select(game => game);

    // Print out the results.
    foreach (var game in subset)
        Console.WriteLine("Item: {0}", game);
    Console.WriteLine();
}

Here, you begin by calling the Where() extension method on the currentVideoGames string array. Recall the Array class receives this via an extension method granted by Enumerable. The Enumerable.Where() method requires a System.Func<T1, TResult> delegate parameter. The first type parameter of this delegate represents the IEnumerable<T> compatible data to process (an array of strings in this case), while the second type parameter represents the method result data, which is obtained from a single statement fed into the lambda expression.

The return value of the Where() method is hidden from view in this code example, but under the covers you are operating on an OrderedEnumerable type. From this object, you call the generic OrderBy() method, which also requires a Func< > delegate parameter. This time, you are simply passing each item in turn via a fitting lambda expression. The end result of calling OrderBy() is a new ordered sequence of the initial data

Last but not least, you call the Select() method off the sequence returned from OrderBy(), which results in the final set of data that is stored in an implicitly typed variable named subset.

To be sure, this “long hand” LINQ query is a bit more complex to tease apart than the previous C# LINQ query operator example. Part of the complexity is no doubt due to the chaining together of calls using the dot operator. Here is the exact same query, with each step broken into discrete chunks:

static void QueryStringsWithEnumerableAndLambdas()
{
    Console.WriteLine("***** Using Enumerable / Lambda Expressions *****");

    string[] currentVideoGames = {"Morrowind", "Uncharted 2",
        "Fallout 3", "Daxter", "System Shock 2"};

    // Build a query expression using extension methods
    // granted to the Array via the Enumerable type.
    var subset = currentVideoGames.Where(game => game.Contains(" "))
        .OrderBy(game => game).Select(game => game);

    // Print out the results.
    foreach (var game in subset)
        Console.WriteLine("Item: {0}", game);
    Console.WriteLine();
}

As you may agree, building a LINQ query expression using the methods of the Enumerable class directly is much more verbose than making use of the C# query operators. As well, given that the methods of Enumerable require delegates as parameters, you will typically need to author lambda expressions to allow the input data to be processed by the underlying delegate target.

Building Query Expressions Using the Enumerable Type and Anonymous Methods

Given that C# lambda expressions are simply shorthand notations for working with anonymous methods, consider the third query expression created within the QueryStringsWithAnonymousMethods() helper function:

static void QueryStringsWithAnonymousMethods()
{
    Console.WriteLine("***** Using Anonymous Methods *****");
    
    string[] currentVideoGames = {"Morrowind", "Uncharted 2",
        "Fallout 3", "Daxter", "System Shock 2"};
    
    // Build the necessary Func<> delegates using anonymous methods.
    Func<string, bool> searchFilter =
        delegate(string game) { return game.Contains(" "); };
    Func<string, string> itemToProcess = delegate(string s) { return s; };
    
    // Pass the delegates into the methods of Enumerable.
    var subset = currentVideoGames.Where(searchFilter)
        .OrderBy(itemToProcess).Select(itemToProcess);
    
    // Print out the results.
    foreach (var game in subset)
        Console.WriteLine("Item: {0}", game);
    Console.WriteLine();
}

This iteration of the query expression is even more verbose, because you are manually creating the Func<> delegates used by the Where(), OrderBy(), and Select() methods of the Enumerable class. On the plus side, the anonymous method syntax does keep all the delegate processing contained within a single method definition. Nevertheless, this method is functionally equivalent to the QueryStringsWithEnumerableAndLambdas() and QueryStringsWithOperators() methods created in the previous sections.

Building Query Expressions Using the Enumerable Type and Raw Delegates

Finally, if you want to build a query expression using the really verbose approach, you could avoid the use of lambdas/anonymous method syntax and directly create delegate targets for each Func<> type. Here is the final iteration of your query expression, modeled within a new class type named VeryComplexQueryExpression:

class VeryComplexQueryExpression
{
    public static void QueryStringsWithRawDelegates()
    {
        Console.WriteLine("***** Using Raw Delegates *****");

        string[] currentVideoGames = {"Morrowind", "Uncharted 2",
            "Fallout 3", "Daxter", "System Shock 2"};

        // Build the necessary Func<> delegates.
        Func<string, bool> searchFilter = new Func<string, bool>(Filter);
        Func<string, string> itemToProcess = new Func<string,string>(ProcessItem);

        // Pass the delegates into the methods of Enumerable.
        var subset = currentVideoGames
            .Where(searchFilter).OrderBy(itemToProcess).Select(itemToProcess);

        // Print out the results.
        foreach (var game in subset)
            Console.WriteLine("Item: {0}", game);
        Console.WriteLine();
    }

    // Delegate targets.
    public static bool Filter(string game) {return game.Contains(" ");}
    public static string ProcessItem(string game) { return game; }
}

You can test this iteration of your string processing logic by calling this method within Main() method of the Program class as follows:

VeryComplexQueryExpression.QueryStringsWithRawDelegates();

If you were to now run the application to test each possible approach, it should not be too surprising that the output is identical regardless of the path taken. Keep the following points in mind regarding how LINQ query expressions are represented under the covers:

Whew! That might have been a bit deeper under the hood than you wish to have gone, but I hope this discussion has helped you understand what the user-friendly C# query operators are actually doing behind the scenes.